Jaccard similarity

Terms from Artificial Intelligence: humans at the heart of algorithms

Page numbers are for the draft copy at present; they will be replaced with correct numbers when the final book is formatted. Chapter numbers are correct and will not change.

Jaccard similarity is a measure of similarity between two documents. To compare two documents, doc1 and doc2, it works on the bag of words of each (say words1 and words2), treated as sets of distinct words, and then calculates
     | words1 ∩ words2 | / | words1 ∪ words2 |
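That is, the number of distinct words that occur in both documents divided by the number of distinct words that occur in either (the union). The value ranges from 0, when the documents share no words, to 1, when they use exactly the same words. The Jaccard similarity is used heavily in document retrieval algorithms.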
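
The calculation can be sketched in a few lines of Python. This is only an illustrative sketch: the function name jaccard_similarity, the lower-casing and whitespace tokenisation, and the choice to return 0 for two empty documents are assumptions made for the example, not part of the definition.

```python
def jaccard_similarity(doc1: str, doc2: str) -> float:
    """Jaccard similarity of the distinct words in two documents."""
    # Assumed tokenisation: lower-case, split on whitespace.
    # Real retrieval systems usually also strip punctuation, stem, etc.
    words1 = set(doc1.lower().split())
    words2 = set(doc2.lower().split())

    union = words1 | words2
    if not union:
        # Both documents are empty; returning 0.0 is an assumed convention.
        return 0.0

    # | words1 ∩ words2 | / | words1 ∪ words2 |
    return len(words1 & words2) / len(union)


score = jaccard_similarity("the cat sat on the mat", "the dog sat on the log")
print(round(score, 3))  # 3 shared words out of 7 distinct words: 0.429
```

In this example the two documents share "the", "sat" and "on", and between them contain seven distinct words, giving 3/7.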

Defined on page 213

Used on Chap. 10: page 213; Chap. 18: page 446